ArthurZucker
left a comment
LGTM, but be careful: EP-only will fail if you divide the head dim by the TP size by default.
    # Account for TP: each KV head is dispatched to a different GPU, so the effective number of KV heads per GPU is
    # simply divided by the TP size (number of GPUs)
    if tp_size is not None and tp_size > 1:
Only if the attention k and v are targets of the TP plan, though.
What could be the other targets? Not familiar enough with the TP plan tbh
Ok, added a boolean `kv_is_tp = "layers.*.self_attn.k_proj" in config.tp_plan and "layers.*.self_attn.v_proj" in config.tp_plan` to condition this.
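For readers following the thread, the condition discussed here can be sketched roughly as below. This is an illustration, not the PR's code: the helper name `kv_heads_per_gpu` is hypothetical, `config.tp_plan` is assumed to be a dict keyed by module patterns, and raising on a non-divisible head count is an assumption made only for this sketch.

```python
from types import SimpleNamespace


def kv_heads_per_gpu(config, tp_size=None):
    """Hypothetical helper: number of KV heads held by each GPU."""
    num_kv_heads = config.num_key_value_heads
    tp_plan = getattr(config, "tp_plan", None) or {}
    # Only shard the KV heads when both k_proj and v_proj are targets of the TP plan
    kv_is_tp = (
        "layers.*.self_attn.k_proj" in tp_plan
        and "layers.*.self_attn.v_proj" in tp_plan
    )
    if kv_is_tp and tp_size is not None and tp_size > 1:
        # Assumption for this sketch: refuse head counts that do not split evenly
        if num_kv_heads % tp_size != 0:
            raise ValueError(
                f"{num_kv_heads} KV heads cannot be split evenly across {tp_size} GPUs"
            )
        return num_kv_heads // tp_size
    # EP-only plans (or no TP at all) keep every KV head on each GPU
    return num_kv_heads


# Stand-in configs; a real model config would carry many more fields
tp_config = SimpleNamespace(
    num_key_value_heads=8,
    tp_plan={
        "layers.*.self_attn.k_proj": "colwise",
        "layers.*.self_attn.v_proj": "colwise",
    },
)
ep_only_config = SimpleNamespace(
    num_key_value_heads=8,
    tp_plan={"layers.*.mlp.experts": "some_expert_parallel_style"},
)
```

With these stand-in configs, `kv_heads_per_gpu(tp_config, tp_size=4)` divides the head count, while `kv_heads_per_gpu(ep_only_config, tp_size=4)` leaves it untouched, which is the EP-only case raised in the review.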
    logit_processor: The [`ContinuousBatchingLogitsProcessorList`] object used to process the logits.
    input_queue: Queue for incoming requests
    input_queue: Queue for incoming requests. Is None if this process is not a TP driver.
    cancel_queue: Queue for cancellation request_ids. Is None if this process is not a TP driver.
Okay will read the rest to see if all are drivers or not
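As a rough illustration of the docstring above, the "queues are None unless this process is a TP driver" idea could look like the sketch below. Everything here is hypothetical: the `RequestQueues` container, the `make_queues` helper, and in particular the assumption that the driver is the process with TP rank 0 are not taken from the PR.

```python
import queue
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestQueues:
    """Hypothetical container mirroring the docstring's two queues."""

    input_queue: Optional[queue.Queue]
    cancel_queue: Optional[queue.Queue]


def make_queues(tp_rank: int) -> RequestQueues:
    # Assumption: the TP driver is the process with TP rank 0
    is_tp_driver = tp_rank == 0
    if is_tp_driver:
        # Only the driver receives requests and cancellations from the outside
        return RequestQueues(input_queue=queue.Queue(), cancel_queue=queue.Queue())
    # Non-driver processes get no queues, so callers must check for None
    return RequestQueues(input_queue=None, cancel_queue=None)
```

Under this assumption, code that enqueues a request would first check `queues.input_queue is not None` and otherwise rely on the driver to broadcast the work.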
* TP heads and DP / TP seeds
* Reproducible hash
* Add the notion of TP drivers
* Fix NCCL device
* Temporary fix for multiple streams
* Better handling of NCCL graph mixing
* Fix cfg
* nit
* Move the seed setting
* Reworked overall to have accuracy scoring
* Adding tests 1/n
* Added tests
* Style
* Fixes
* CC review
* Nits
* Renames
* Small fixes
* Move distributed stuff to a distributed file
* Docstring
* Final fixes
* Review compliance
* Review compliance 2
* Rebase fix
* Style
* Less redundant testing suite
* Fix TP plan
* Fix stopping condition
* Nits
This PR adds support for TP in continuous batching. The major changes required to do this were:
... `hash`, which is salted depending on the process

It also adds a mechanism to the benchmark script to make sure the generation is coherent.
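To illustrate the salting issue mentioned above: Python's built-in `hash` is randomized per process for `str` and `bytes` objects (controlled by `PYTHONHASHSEED`), so two TP processes can disagree on the hash of the same input. A minimal sketch of a process-independent alternative, using `hashlib`, is shown below. The function name `stable_hash`, the choice of SHA-256, and the assumption that the hashed object is a sequence of token ids chained on a parent hash are all illustrative and not taken from the PR.

```python
import hashlib


def stable_hash(token_ids, parent_hash=0):
    """Hypothetical process-independent 64-bit hash of a sequence of token ids."""
    h = hashlib.sha256()
    # Chain on the parent hash so identical tokens after different prefixes differ
    h.update(parent_hash.to_bytes(8, "little"))
    for token_id in token_ids:
        h.update(int(token_id).to_bytes(8, "little", signed=True))
    # Keep the first 8 bytes of the digest as an unsigned 64-bit integer
    return int.from_bytes(h.digest()[:8], "little")


first = stable_hash([1, 2, 3])
second = stable_hash([4, 5, 6], parent_hash=first)
```

Because the digest depends only on the input bytes, every process computes the same value for the same tokens, whatever its hash seed.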
Performance
No perf regression, TP is faster.
Tests
Added tests for TP, all tests run.